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Abstract 

We revisit the problem of computing Frechet distance between polygonal curves under Li, 
L 2 , and Loo norms, focusing on discrete Frechet distance, where only distance between vertices 
is considered. We develop efficient algorithms for two natural classes of curves. In particular, 
given two polygonal curves of n vertices each, a e-approximation of their discrete Frechet distance 
can be computed in roughly 0{nK^logn/s^) time in three dimensions, if one of the curves is 
K-bounded. Previously, only a ^-approximation algorithm was known. If both curves are the so- 
called backbone curves, which are widely used to model protein backbones in molecular biology, 
we can e-approximate their Frechet distance in near linear time in two dimensions, and in roughly 
log nm) time in three dimensions. In the second part, we propose a pseudo-output-sensitive 
algorithm for computing Frechet distance exactly. The complexity of the algorithm is a function of 
a quantity we call the number of switching cells, which is quadratic in the worst case, but tends to 
be much smaller in practice. 


1 Introduction 

Frechet metric is a natural measure of similarity between two curves [EGH+02] . An intuitive definition 
of the Frechet distance is to imagine that a dog and its handler are walking on their respective curves. 
Both can control their speed but can only go forward. The Frechet distance of these two curves is the 
minimal length of any leash necessary for the handler and the dog to move from the starting points of 
the two curves to their respective endpoints. Frechet distance and its variants have been widely used 
in many applications such as in dynamic time-warping [KP99], speech recognition [KHM’''98], signature 
verihcation [PP90], and matching of time series in databases [KKS05]. 

*A preliminary version of this paper appeared in ESA 2006 [AHK+06]. 
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Alt et al. [AG95] present an algorithm to compnte the Frechet distance between two polygonal 
cnrves of n and m vertices, respectively, in time 0{nm\og^{nm)). Improving this ronghly qnadratic- 
time solntion for general cnrves seems to be hard, and so far, no algorithm, exact or approximate, with 
rnnning time signihcantly smaller than 0{nm) has been found for this problem for general curves.Since 
the Frechet distance essentially requires computing a correspondence between the two curves, it has 
some resemblance to the edit distance problem (which asks for the best alignment of two strings), for 
which no substantially subquadratic algorithm is known either. 

On the other hand, another similarity measure, the Hausdorjf distance, can be computed faster in 
the plane and approximated efficiently in higher dimensions. Unfortunately, Hausdorff distance does 
not reflect curve similarity well (see Figure 1 (a) for an example). Alt et al. [AKW04] showed that 
the Hausdorff distance and the Frechet distance are the same for a pair of closed convex curves. They 
also showed that the two measures are closely related for K-bounded curves. Roughly speaking, for any 
two points p,q on a. ^-bounded curve r, T{p,q), the subcurve from p to q, is contained within some 
neighborhood of p and q with size roughly n\\p — q\\ (see Figure 1 (b); precise dehnition is introduced 
later). Alt et al. showed that the Frechet distance between any two ^-bounded curves is bounded by 
K + 1 times the Hausdorff distance between them. This leads to a ^-approximation algorithm for the 
Frechet distance for any pair of ^-bounded curves, and they also developed an algorithm to compute 
the reparametrizations of input curves that realize this approximation (see the dehnitions below). The 
algorithm runs is 0((?7,-|-m) log^(n-|-m)2"’^’^’''’")) time in two dimensions. In three or higher dimensions, 
the time complexity is dominated by the computation of Hausdorff distance between the curves. Not 
much is known about the Frechet distance for other types of curves. In fact, even for x-monotone curves 
in three dimensions, no known algorithm runs in substantially subquadratic time. 

The problem of minimizing Frechet distance under various classes of transformations has also been 
studied [AKWOl, Wen02]. However, even in two dimensions, the exact algorithm takes roughly 0{n^) 
time for computing the best Frechet distance under translations, and roughly 0(n®) time under rigid 
motions. Approximation algorithms have been studied [CMOS, Wen02], but practical solutions remain 
elusive. The basic building block of those algorithms, as well as one of the bottlenecks, is the computation 
(or approximation) of the Frechet distance between curves vr and a. 

There is a slightly simpler version of the Frechet distance, the discrete Frechet distance, which 
only consider vertices of polygonal curves. Its computation takes 0(n^) time and space using dynamic 
programming [EM94], and no substantially subquadratic algorithm is known either. Frechet distance 
has also been extended to graphs (maps) [AERW03], to piecewise smooth curves [RotOS], to simple 
polygons [BBW06] and to surfaces [AB05]. Finally, Frechet distance was used as the similarity measure 
for morphing [EGHMOl] between curves, and for high-dimensional approximate nearest neighbor search 
[Ind02]. It was also used for efficient curve simplihcation [AHMW05]. 

Our results. Given the apparent difficulty of improving the worst-case time complexity of computing 
the Frechet distance between two unrestricted polygonal curves, we aim at developing algorithms for 
more realistic cases. First, in Section 3, we consider efficient approximation algorithms for the slightly 
simpler variant of Frechet distance, the discrete Frechet distance, the best algorithms for which currently 
have only slightly better worst-case time complexity than the continuous case. Most currently algorithms 
for computing Frechet distance rely on a so-called decision procedure which determines whether a given 
distance is larger or smaller than the Frechet distance between the two given curves. We observe that an 
approximation solution to the decision problem can lead to an approximation of Frechet distance, and 
curve simplihcation can help us to approximate the decision problem efficiently. We apply this idea for 
two families of common curves. In the hrst case, given two polygonal curves of size n and m respectively. 
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(a) (b) (c) 



Figure 1: (a) Light and dark curves are close under Hausdorff but far under Frechet distance, (b) vr is 
n-bounded iff for any p, g € tt, subchain 7r(p, q) lies inside the shaded region, where the radius of two 
disks (centered at p and q) is Kd(p, g)/2. (b) The dashed curve /i-simpli£es the solid one; the radius of 
each disk is p. 

with one of them being ^-bounded, we can ^-approximate their discrete Frechet distance in 0{{m + 
/e'^) log{nm)) time in d-dimensions. In the second case, both curves are so-called backbone curves, 
used widely to model molecular structures like protein backbones, and DNA/RNAs. We e-approximate 
their (both discrete and continuous) Frechet distance in near linear time in two dimensions, and in 
roughly log nm) time in three dimensions. 

In Section 4, we shift our focus back to the exact computation of discrete Frechet distance. Previously, 
the problem of deciding whether Frechet distance was smaller than some threshold was cast as hnding 
some viable path in the so-called free-space diagram which is a n x m map. We observe that such 
viable path can be computed once some subset § of cells in the free-space diagram are given. The size 
of § is nm in worst case, but is expected to be much smaller in general. Based on this observation, 
we present algorithms that run in 0(|§| -|- nlog'^“^n) time for discrete Frechet distance in norm 
in d-dimensions. For L 2 norm, it takes roughly 0(|§| -|- polylogn) time in two dimensions, and 
0(|§| + polylogn) time for d > 2. 


2 Preliminaries 

A (parameterized) curve in can be represented as a function /: [0,1] —>■ M'^. A (monotone) reparametriza- 
tion a is a continuous non-decreasing function a: [0,1] —>■ [0,1] with q;(0) = 0 and a(l) = 1. Given two 
curves f,g : [0,1] —)■ the Frechet distance between them, 6F{f,g), is dehned as 

^f(/, g) ■= inf niax d{f{a{t)),g{f3{t))). 
a,fS 4e[0,l] 

where d(a:, y) denotes the Euclidean distance between points x and y, and a and (3 range over all 
monotone reparametrizations. 

Discrete Frechet Distance. A simpler variant of the Frechet distance for two polygonal curves 
71 = {pi,P 2 , ■ ■ ■ ,Pn) and a = {qi, q' 2 ,..., qm) is the discrete Frechet distance, denoted by dD(7J', o'). Imagine 
that both the dog and its handler can only stop at vertices of vr and a, and at any step, each of them 
can either stay at their current vertex or jump to the next one (i.e., magically, both the dog and the 
handler seem to have turned into frog princess and prince, respectively). The discrete Frechet distance 
is dehned as the minimal leash necessary at these discrete moments. 

To formally dehne the discrete Frechet distance, we hrst consider a discrete analog of continuous 
reparametrizations. A discrete monotone reparametrization a from {!,..., fc} to is a non¬ 
decreasing function a : {1,..., /c} —>■ for integers fc > ^ > 1, with a(l) = 1, a{k) = i and 
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a{i + l) < a{i) + l, for alH = 1,..., k — 1. An (order-preserving complete) correspondence between n and 
(T is a pair (a, (3) of discrete monotone reparametrizations from {1,..., A;} to {1,..., m} and n}. 

The discrete Frechet distance between vr and a, (5D(7r, cr) is 

^d(/, g) := min niax d{f{a{t)),g{(3{t))), 

(o,/3) te[i,fc] 

where {a, (3) range over all order-preserving complete correspondences between vr and a. An equivalent 
dehnition of order-preserving complete correspondence between vr and a is a set of pairs M C {(p, g) | 
p E 7 i,q E a} such that (i) order-preserving-, if {pi,qj) E M, then no {Ps,gt) E M for s < i and t > j; 
and (ii) complete: for any p E n (resp. q E a), there exists some pair involving p (resp. q) in M. The 
discrete Frechet distance is related to the edit distance between the “strings” vr and a where the cost of 
changing a symbol is the Euclidean distance of the relevant points. 

It is well known that discrete and continuous versions of the Frechet distance relate to each other as 
follows: 

5F(7r, a) < 5D(vr, a) < 6 ^( 71 , a) max{£i, 4}, 

where £i and £2 are the lengths of the longest edges in tt and a, respectively. This suggests using to 
approximate (5 f. Unfortunately, it seems that computing (5D(vr, a) is asymptotically almost as hard as 
computing 5F(7r, a). 

Decision problem. In the original paper, Alt and Godau [AG95] used the following framework to 
compute (5F(7r, cr): First, develop a procedure that answers the following decision problem in Q{nm) 
time and space by a dynamic programming algorithm: Given a parameter 5 > 0, is 5F(vr,(j) < 67 
This procedure is then used as a subroutine to search for 5F(7r,(T) using parametric search paradigm 
within O(nm log^ nm) time[AG95, AST94]. The same paradigm can be used to compute 5D(7r, a) in 
0{nmlog{nm)) time by replacing the parametric search to a binary search. Although this is slightly 
worse than the 0(nm) algorithm in [EM94], we describe how to solve the decision problem for 5D(7r, cr) 
below, as our algorithm will use this framework, and as the algorithm from [EM94] runs in Q{nm) time 
for any input. 



Figure 2: The valid path (solid curve) in the Free-space diagram D{7i, a, 5) in (a) corresponds to the 
order-preserving, complete correspondence (dashed lines) in (b). The path in (c) is not valid, as the 
two solid segments violate the bi-monotonicity condition, (d) The directed graph corresponding to the 
white cells in free-space diagram in (a). 


Given two polygonal chains vr and a and a distance threshold 5 > 0, we construct the following 
free-space diagram D = D(7r,cr, 5); D is an n x m matrix (grid) and a grid cell D[i,i] has value 1 if 
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d(Pi, Qj) < S, and value 0 otherwise. We refer to 1-cells as white and 0-cells as black. The white cells 
in the ith column (resp., jth row) correspond to the set of vertices of a (resp., tt) whose distance to pi 
(resp., qj) is less than S. A viable path in H is a path connecting s := D[l, 1] to t := Zl[n, m], visiting 
only white cells of D, and moving in one step from (*, j) to either {i,j + 1), {i + 1, j), or {i + + 1). 

It is easy to check that a complete order-preserving correspondence M induces a viable path in D and 
vice versa (see Figure 2). Hence the problem of deciding “(5D(vr, cr) < 67” is equivalent to deciding the 
existence of a viable path in D. 

Given D, one can extract a viable path, if it exists, in Q{nm) time by a dynamic programming 
algorithm. Alternatively, one can traverse a directed graph G dehned as follows: The nodes of G are 
the white cells of D. A white cell is connected to its top, right, or top-right neighbor cells by a directed 
edge, if they are white. See Figure 2 (d). The size of G is the bounded by \W\, the number of white cells 
of D, the in-degree (out-degree) of each node is at most three, and (5D(vr, cr) < 5 if and only if there is 
a directed path in G from (1,1) to {n,m). That is, testing this condition corresponds to a connectivity 
check in a directed graph in time (9(|hF|), once the graph G is given. 

Approximations. We say that r is an ^-approximation of (5(7r, a) if 

(1 — e)5{Ti, ct) < r < (1 -|- e)(5(7r, a). 

We say that an algorithm e-approximates the decision problem “Is 6 {n^a) < 67”, if it returns ‘yes’ 
whenever 6 {7i,a) < (1 — e)6 and ‘no’ whenever 6 {7i,a) > (1 -|- e)5. If 5 is a (1 -|- £)-approximation of 
6 {7r, a), the algorithm is allowed to return either ‘yes’ or ‘no.’ Such an algorithm is also called an e-fuzzy 
decision procedure for 6 {7i, a). 

3 Approximation Algorithms Based on Simplification 

In this section, we hrst introduce a general framework for approximating the discrete Frechet distance 
by solving the decision problem approximately. We then present efficient approximation algorithms for 
two families of common curves based on this framework: the ^-bounded curves and the backbone curves, 
using curve simplihcations, packing arguments, and other observations. 

3.1 Approximation via approximate decision problem 

Given a set P of N points in compute a well-separated pairs decomposition (WSPD) of P for a 
separation parameter 10, which is a collection {(A,, Bi)} of pairs of subsets of P, with the property that 
(1) for every pair of points x,y E P, there is an index i, so that x G A, and y E Bi and (2) the minimum 
distance between Aj and Bi is at least 10 times the diameter of either set. One can compute such a 
collection of size 0{N) in 0{N log N) time [GK95]. For every pair (Aj,Hj) in the WSPD, we choose an 
arbitrary pair of points Pi E Ai and Qi E B^ as its representatives. It is easy to check that the distance 
between any two points x,y E P is 1/5-approximated by the distance between the representatives of 
the corresponding WSPD pair. 

If we want to approximately solve an optimization problem using a decision procedure, where the 
optimal solution 6 * is one of the distances induced by a pair of points of P, then we can use the above 
WSPD to extract 0{N) values: for each WSPD pair, we take the distance between its representative 
points. Next, we replace each value x by two values |x and |x, sort the resulting values, and perform a 
binary search (using the decision procedure) to identify which interval delimited by consecutive values 
contains 6 *. Let J = [x,y] be the resulting interval; obviously y < |x. We now perform another binary 
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search on this interval to identify the interval [x',y'] containing 5* with y' < {1 + £)x', giving rise to an 
^-approximation of 6*. The second binary search invokes the decision procedure 0(log(l/£)) times. 

Interestingly, the decision procedure does not have to be exact, and it can return a fuzzy answer, in 
the sense of last section. An equivalent view of an £-fuzzy decision procedure is: for a parameter 6, if 
it returns “no”, then h* < (1 -f e)S-, otherwise if it returns “yes”, then h* > (1 — £)h. It can be shown 
that the above binary search can be adapted to work with a fuzzy decision procedure with the same 
performance guarantees (details omitted and can be found in Appendix A. We summarize: 

Theorem 3.1. Let P be a set of N points in and let X be an optimization problem, for which the 
optimal answer is a distance induced by a pair of points of P. Given an e-fuzzy decision procedure for 
X, one can e-approximate the optimal solution in 

0{N log N + Tfdecision(A^, 1/10) logN + Tfdecision(A^, ejP) log(l/£)) 

time, where Tfdecision(A^a) is running time of the fuzzy decision procedure when the required accuracy 
is e. 

Proof: The algorithm is described above. The fuzzy decision procedure can be used with constant 
accuracy in the stage of the algorithm. Higher accuracy of e/4 is required only at the second stage, 
when we perform the binary search over the interval J. ■ 

On the other hand, observe that there must exist some p* E n and q* E a such that d{p*,q*) = 
In other words, the solution 6* = S]:,{7i,a) will be one of the distances induced by a pair of 
points from P = { vertices from tt and a}. Hence the above theorem implies that we now only need a 
fuzzy decision procedure for (5D(7r, a) in order to approximate 6B{7i,a). 

3.2 Approximation with simplifications 

The remaining question is how to implement fuzzy decision procedure efficiently. One useful heuristic 
is curve simplification. Below we first describe the particular simplification we use and how it helps in 
approximating hD(7J', o')- We then show that together with a packing argument and other observations, 
guaranteed efficiency can be achieved for the two classes of common curves that we investigate. 

Greedy simplification. Given a polygonal chain tt = (pi,...,p„), we simplify vr to obtain tt = 
{pi,... ,pk), where vertices of n form a subsequence of tt, with pi = pi and pk = Pn- More precisely, 
let In{i) = j if Pi = Pj E tt; the subscript vr is omitted when it is clear from context. We say that 
n y-simplifies tt if (i) I{i) < I{k) ioi i < k (i.e, order-preserving), and (ii) d{pi,pk) < y for any 
k E [ I{i),I{i + 1) ) (see Figure 1 (c)). (This definition of /x-simpli£cation is slightly different from the 
standard definition found in the literature.) 

We construct a /i-simplification of vr, T, in a greedy manner: Start with pi = pi. At some stage, 
suppose we have already computed pi = pj. In order to find I{i + 1), we check each vertex of vr 
starting from pj in order, and stop when we reach the first edge PkPk+i of vr such that d{j)j,pk) < y and 
d{j)j,pk+i) > y- We set = Pk+i and proceed until we reach p^, at which point we add Pn as the 
last vertex of vf. The entire procedure takes linear time. By construction, the following observation is 
straightforward. 

Observation 3.2. For any edge PiPi^i in vf, other than the last edge, we have d{pi,pi+i) > y. 

Now if we /x-simplify both input curves vr and a to obtain vf and a, we have the following lemma: 
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Figure 3: Small empty circles mark vertices of ^ and a. Left picture shows part of vr and a and the 
correspondence M* (indicated by dashed segments). Right picture shows if and a (thick curves) and 
the induced correspondence for them (thick dashed segments). 

Lemma 3.3. (5D(7r, cr) — 2/i < (5D(if, ?) < 5D(7r, a) + /x. 

Proof: We hrst consider the right-hand inequality. Let 5* = 5D(7r,cT), and M* be the complete order¬ 
preserving correspondence that produces (5D(7r, cr). Obviously, for any pair {pi,qj) in M*, we have that 
d{pi,qj) < 6*. M* can be modihed into a correspondence M between rf and a as follows: we add the 
match (pi, qj) to M if and only if (Cl) there exists a match {pi^[i), qt) G M* such that b G [Ia{j), -fo-(j+l)), 
or (C2) there exist a match (p^, qi„{j)) G M* such that a G [/ 7 r(*), lTr{i + l)), where (resp. I„) maps the 
indices between vr and rf (resp. a and a) (see Figure 3 for an example). It is easy to verify that M is both 
complete and order-preserving. By the triangle inequality, we have that d{pi,qj) < d{pi,qb) -|- d{qb,qj) 
for case (Cl), implying that d{pi,qj) < d{pi,qb) + p < 6* + p (the case for (C2) is symmetric). Since 
should be smaller than the distance induced by M, the right-hand inequality then follows. 

The proof for the left-hand inequality is similar but slightly more involved. Details omitted and can 
be found in the Appendix B. ■ 

The above lemma implies that if the answer to (5D(7f, a) < 5 is ‘y^s’, then, 5D(7r,cr) < 5 -|- 2p. If 
it is ‘no’, then bBi'n’yCr) > S — p. Thus the decision problem for < 50 ( 7 ?, a) (2/i/5)-approximates that of 
(5D(7r, cr). We next show that ^ 0 ( 7 !, a) can be answered asymptotically much faster for two special classes 
of curves, giving rise to efficient fuzzy decision procedure for them. 

3.3 Prechet Distance for /t-bounded curves 

Given a polygonal curve vr, let 7r(x,y) C n denote the subcurve of n that connects x G vr and p G vr, 
and lT^{x,y) the length of n{x,y) along tt; tt may be omitted from the subscript when clear. We say 
that TT is K- straight if l{x,y) < n ■ d(x,p), for any x,p G vr. Examples of rc-straight curves include 
curves with increasing chords of [Rot94] and self-approaching curves of [AAI’''01]. As dehned by Alt et 
ah [AKW04], TT is ^-bounded if 7i{x,y) C B{x, |d(x,p)) U B{y, |d(x,p)), for all x, p G vr, where B{x,r) 
is the radius-r Euclidean ball centered at x and where we have slightly abused the notation by treating 
a curve section as a point set. See Figure 1 (b) for an illustration in two dimensions. Every ^-straight 
curve is ^-bounded. 

We now describe how to construct an £-fuzzy decision procedure for the problem “(5D(7r,cr) < 5?”, 
where one curve, say a, is ^-bounded. We hrst /x-simplify vr and a into n and a respectively, using 
p := e5/2. By Lemma 3.3, the decision problem for (5 d(^, 5) is an e-fuzzy decision procedure for 
(5D(7r,cr). Hence we now focus on checking whether (5 d(^,5) < d. Let rx, m, r, s be the size of tt, a, T, 
and a respectively; r = 0{n) and s = 0{m). 

Decision problem for 5£)(^, ?)• Let R be the free-space diagram for T and a with respect to 6. 
Recall that (5 d(^, ?) < 5 if there exists a viable path in R which can be computed in 0(|IF|) time once 
W, the set of white cells of R are given. We hrst bound the size of W. 
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For every p G let N{p) be the set of points from a contained in Obviously, |PF| = 

l"^(p)l- Consider any two points qi,q 2 G d that he in B{p, 5) for some p There are two cases: 
(i) qiq 2 is an edge of a and (ii) otherwise. For case (i), we have that d(gi,g 2 ) > by Observation 3.2. 
For case (ii), we know that 


hv 

c^(gi, 92 ) C 5(gi, 2 ^( 51 , 92 )) U B{q2, ^d{qu ga)), 

as a is ^-bounded. Furthermore, let qiq C a be the edge with q G cr(gi,g2); d(gi,g) > p by 
Observation 3.2. It then follows that (K/2)d(gi, ga) > h therefore d(gi,g 2 ) > 2/i/K. Hence 
N{p) = 0 {{k6/pY) by a straightforward packing argument. This means that the number of white 
cells is |IF| = 0{s{n6 /pY) = 0{n{K5/pY) given that a is a ^-bounded curve. 

We still need to compute N{p) efficiently, that is, to enumerate the set of vertices of a contained 
in B{p,6) for every p G T. This can be done by a spherical range query. As there are no known 
efficient algorithms for spherical range queries, we instead first perform a /^-approximate range query of 
B{p,6) among all vertices from a, such that vertices lying completely inside B{p,6) are guaranteed to 
be retrieved, those completely outside B{p, (1 -|- (3)6) will not be reported, while those in-between may 
or may not be returned. By the same packing argument as above, it is easy to verify that the number 
of vertices returned is still bounded by We then inspect each vertex returned, and only 

mark the corresponding cell in T) white when it indeed lies in B{p, 6). 

We preprocess a into a data structure of size 0(s) = 0{m), using 0{s) preprocessing time, such that 
the resulting data-structure answers /^-approximate range query for B{p,6) in 0(1/(3'^) time. This can 
be easily achieved by constructing a grid of appropriate size (which is (36), throwing the points of a into 
this grid (using hashing). Next, an approximate spherical range query is no more than probing all the 
grid cells that intersects the query ball. The number of cells being probed in a single query is 0(1/(3^- 
Therefore the set of white cells in T) can be computed in 0{r -1- s -F r{K6/p)Y time by choosing (3 > 0 
to be a small constant, say (3 = 1/2. 

Putting everything together, we have an £-fuzzy decision procedure for 5D(7r, cr) that runs in 0{n + 
m + riK^ /e'^) time and space in By Theorem 3.1, we have that: 

Lemma 3.4. An e-approximation of 6'D{'K,a) for a polygonal curve vr and a n-bounded curve a, of size 
n and m respectively, can be computed in 0{{m + nnf /e'^) log(n/£)) time and 0{n + m + nnA/e'^) space 
in d dimensions. 

3.4 Prechet Distance for Protein Backbones 

In molecular biology, it is common to model a protein backbone by a polygonal chain, where each Ca 
atom becomes a vertex, and each edge represents a covalent bond between two consequent amino acids. 
All the bonds have approximately the same bond length, and no two atoms (thus vertices) can get too 
close due to van der Waals interactions. This is the motivation behind the study of the backbone curves, 
which have the following properties: 

PI. For any two non-consecutive vertices u and v of the curve, d(M,n) > 1, 

P2. Every edge of the curve has length I such that ci < I < C 2 , where ci, C 2 > 0 are constants. 

We remark that although proteins lie in three dimensional space, there are simplified models for protein 
backbones in both two and three dimensions, such as the lattice model which has been widely studied 
to understand the mechanism behind protein folding [GIP99, KS94]. 



Now suppose we are given backbone curves vr and a in Given a distance threshold 5 > 0, we 
want to know whether 5D(7r, cr) < 5. We /i-simplify tt and cr to obtain ^ and a as in the previous case, 
for /i = £5/2, and construct the free-space diagram D for tt and a with respect to 5. D is an r x s grid, 
where by Observation 3.2 and property P2, r = \ti\ < C 2 n//i and s = |d| < C 2 m//i. Once D is given, the 
decision problem can be solved in time proportional to \W\, where W is the set of white cells in 2). 

The set of white cells W. A straightforward bound for |iy| is (9(min{r5'^, as by the packing 

argument and property PI, there are at most 0{5^) vertices lying in 5-neighborhood of any vertex of tt 
and a. If 5 < 1, then the number of white cells is 0{n + m). Hence we now assume that 5 > 1. 

We can improve this bound by a more careful counting analysis. Assume without loss of generality 
that r < s. For any vertex p ^ tt and its 5-neighborhood B{p,6), let E{p) be the set of edges of a 
intersecting the ball B{p, 5). The number of vertices of a in B{p, 5) can be upper bounded by 0{\E{p)\). 
Furthermore, given any edge e = (g*, Qi+i) G a, let a{e) = g/^p+i)) (that is, subchain cr(e) C a is 

simplihed into edge e in chain a). E{p) can be partitioned into two sets: (i) Ei = {e E E{p) \ a(e) C 
B(p,6)}, and (ii) E 2 = {e E E{p) \ at least a vertex of a{e) lies outside B{p,6)}. 

By property P2, we know that the number of vertices in a{e) is at least /i/c 2 for any e Ea. Therefore 
|F'i| = 0(c25'^//i). On the other hand, for every edge e G E 2 , there is at least one vertex of a{e) that 
lies in the spherical shell of B{p,S + C 2 ) \ B{p,6), as the length of edges in a is at most C 2 . Since the 
volume of this spherical shell is 0 (c 2 (c 2 + 5)'^“^), the size of E 2 is bounded by 0 ((c 2 (c 2 + 5)'^“^/(c^~^))). 
Therefore, we have that \E{p)\ = |F^i| -|- |F^ 2 | = 0(5'^“^ -|- 5'^//i). Summing it over all r vertices of 
T, we have that |fF| = 0(^(5'^“^ -|- 6'^/fi)). Furthermore, since this number cannot exceed the size 
of H which is 0{rs) = 0{nm/pP)^ we have |hF| = min{nm//i^, 0(^(5'^“^ -|- 5'^//i)}. Note that |hF| is 

maximized when the two balancing terms are equal: that is, when 5 = This implies 

that |hF| = !e^). 

We still need to compute these white cells of T> efficiently. Similar to the case for ^-bounded curves, we 
preprocess a into a data structure of size 0(s), using 0(s) preprocessing time, such that the resulting 
data structure answers /3-approximate range query for B{p,6) in time, for a small constant 

P > 0, say P = 1/2. We then check all vertices returned by this approximate range query, and keep only 
those indeed contained in B{p, 5). Overall, we can compute all white cells in 0{r -|- s -|- |hF|) time and 
space, thus can answer the decision problem “Is 5D(T,d) < 5? ” in the same time and space. Putting 
everything together, we have: 

Lemma 3.5. Given two backbone curves of sizes n and m, respectively, we can develop an e-fuzzy 
decision procedure for 5D(vr, cr) w.r.t. 5 that runs in 0{{n + m) + ^nni}~‘^^‘^) time and space. In 
particular, the time complexity is 0{n -f- m/e^) when d = 2, and 0{n -|- m -f /e^) when 5 = 3. 

Finally, for backbone curves, in order to approximate 5D(vr, cr), one can use a binary search procedure 
(described in Appendix C instead of the approach using WSPD as described earlier. The advantage of 
the binary search procedure is that all results can then be extended for the continuous case 5F(vr, cr) by 
more careful and involved packing arguments. We conclude with the following theorem. 

Theorem 3.6. Given two backbone curves tt and a of n and m vertices respectively, we can compute 
an e-approximation o/5F(vr,cr) in log(nm)) time in two dimensions, and 0{\nm^/^\og{nm)) 

time in three dimensions. 

^In the following, the big O notation sometimes hide factors depending on constants ci and C 2 . 
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4 Pseudo—Output-Sensitive Algorithm 

In this section, given curves n and a of size n and m, respectively, we present a pseudo-output-sensitive 
algorithm for computing (5D(vr, a) for general curves. Although the worst case complexity may still 
be Q{nm), we believe that the observation made within should help to produce efficient (possibly 
approximate) algorithms for Frechet distance in practice. In what follows, we provide results for Loo 
norm (which provides a constant factor approximation for optimal solution under L 2 norm). The time 
complexity for exact computation under L 2 norm is quite messy and omitted. 

Suppose we have an algorithm that answer the following select-distance query in B{N) time: given 
a set of N points P and a rank k, what is RANKrf(A;), the kth smallest distance among all pair-wise 
distances from P. Now given an algorithm to solve the decision problem “Is 5* = 5D(7r, a) < 67” in time 
A{n m), we can hnd the optimal solution 6* in 0{{A{n -|- m) + B[n + m)) log(nm)) time by querying 
RANKrf(A;) among n + m points in a binary search manner^. For Lqo norm, the distance-selection problem 
can be solved in 0{dN \og^~^ N) in [Sal89]. For the decision problem, a straightforward bound for 
time complexity A is |hF| plus the time to compute IF, where W is the set of white cells in the free-space 
diagram D = a, 5) for a threshold 5 > 0. Below we provide a tighter bound for A although its 
worst-case complexity is still Q{nm). 

Switching cells. Given an n x m map D with respect to some threshold 5, a switching cell is a 
white cell whose immediate neighbor above or below it is black. So if D[i,j] is a switching cell, then 
the edge qjQj+i C a (or qjqj^i) intersects the boundary of B{pi,6) exactly once (one endpoint must he 
inside and one must be outside). For a vertex p G vr, while the set of white cells involving p correspond 
those vertices from q falling inside B{p,S), the switching cells involving p correspond to those vertices 
inside B{p, 6) with one incident edge crossing the boundary of B{p, 5). Let § = §(7r, a, 5) denote the set 
of switching cells of D(7r, a, 5). Although in worst case |§| = h2(|IF|) = Q{nm), we expect it to be much 
smaller than |kF| in practice. For example, consider the case when vertices of a form lines of a cubic 
lattice of size x x and 5 is roughly n}!'^ j2. For a vertex p at the center of this cube, the 
number of white cells in the corresponding column in D is 0(n), while the number of switching cells is 
0(^2/3)_ rjpg remaining questions are (i) how to compute the set of switching cells §(7r, a, 5) and (ii) 
how to solve the decision problem once § is given. 

Decision problem with §. Once the set of switching cells is given, we can solve the decision problem 
in 0(|§|) time and space as follows. Instead of representing D explicitly, we now represent each column 
of D, C[i\ for 1 < i < n, as a set of ordered intervals, where each interval corresponds to a maximal 
set of consecutive white cells in this column. Obviously, the endpoints of these intervals are exactly 
the switching cells. Let V[i] be the set of ordered intervals, each representing a maximal set of cells in 
the Ah column reachable from D[l,l], and |C'[z]| and \V[i\\ the number of intervals in C[i] and F[i], 
respectively. Easy to see that \V[i\\ < |C'[z]|, because all cells covered by intervals from V[i] are white, 
and because if any cell c from an interval I G C[i] is in some interval J G C[i], then all cells of I above 
c should also be covered by J. Our algorithm scans D from left to right (i.e, from column 1 to column 
n), and at the Ah round, we compute V[i\ by merging C[i\ and V[i — 1] in 0{\V[i — 1]| -|- |G[i]|) time 
using a merge-sort like procedure (see details in Appendix D). 

Computing §. Given p and h, let §(p, 5) denote the set of edges from a “crossing” the boundary of 
B{p, 5). Here by crossing, we mean that one endpoint of the edge is inside B{p, 6) and one is outside (so 

^In some sense, our previous approach using WSPD is performing implicit approximate distance selection. 
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it is not the usual segment/ball intersection problem). To compute §, we need to perform n edge/bail 
crossing gueries, one for each vertex from vr. Under the L^o norm, the basic operation is in fact an 
edge/cube crossing query, where all cubes are congruent. We can preprocess the set of edges by building 
a range-search tree for their endpoints (similar to the multi-level data structure for orthogonal range 
reporting problem). The entire data structure has size (9(mlog^'^m) and given a cube, the set of edges 
crossing it can be reported in (9(log^'^m -|- k) where k is the number of such edges. 

Putting everything together, we conclude with the following theorem: 

Theorem 4.1. Given two arbitrary polygonal curves vr and a in with n and m vertices, respectively, 
one can compute 5 -d{ti, a) under L^o-norm, in + (n + rn)\o^'^(nrn))\og(nrn)) time and 0{^ + {n + 
m) \o^'^(nm)) space, where <h is an upper bound of the number of switching cells for any threshold 6. 

We remark that for L 2 norm, the running time is 0(<h -|- (n -|- m)^/^ log(nm)) for d = 2 and 0((<h -f 
(n -|- )log(?7,m)) time for d > 2. The edge/bail crossing query required by computing § can 

be converted into an segment/hyperplane query in one dimension higher. It is less practical as the 
solution involves heavy machinery. Nevertheless, if approximation is allowed, one can use the idea 
from Theorem 3.1 as well as multi-dimensional range trees to obtain an e-approximation algorithm in 
0(<h -|- polylog n) time and space where polylog depends on both e and d. 

5 Conclusions and Discussion 

In this paper, we considered the problem of computing discrete Frechet distance between two polygonal 
curves either approximately or exactly. Our main contribution is a simple approximation framework 
that leads to efficient ^-approximation algorithms for two families of common curves: the K-bounded 
curves and the backbone curves. We also consider the exact algorithm for general curves, and proposed a 
pseudo-output-sensitive algorithm by observing that only a subset of the white cells from the free-space 
diagram are necessary for the decision problem. It will be interesting to investigate whether there are 
families of curves that are guaranteed to have small <h, which is the upper bound on the number of 
switching cells. 

We feel that for general curves, it might be hard to develop algorithms that are signihcantly sub¬ 
quadratic in worst case, given that no such algorithm exists for a related and widely studied problem, the 
edit distance for strings. Hence our future directions will focus on practical variants of Frechet distance 
so that one can handle outliers and/or partial matching, or so that one can perform efficient multiple- 
curve alignments. Another important direction is to develop efficient (approximation) algorithm for 
computing smallest Frechet distance under rigid motions (in particular rotations). 

Postscript. Since the appearance of this paper in ESA 2006 [AHK+06] a lot of research was done 
on related problems. Driemel et ah [DHW12] introduced the notion of c-packed curves, and showed a 
near linear time algorithm for such curves. Bringmann [Bril4] proved that Frechet distance can not be 
computed exactly in subquadratic time under the SETH hypothesis. There is more recent research on 
the Frechet distance, but surveying it is outside the scope of this note. 
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A Approximation via fuzzy decision procedure 

Given an /9-fuzzy decision procedure ApprDecision((5, ft) for deciding whether 6* < 6, we combine it 
with the WSPD approach described in Section 3.1 to compute an ^-approximation of 6*. In par¬ 
ticular, construct the 0{n) distances using WSPD as before, and perform a binary search by querying 
ApprDecision(u, 1/10) among these distances to identify the interval J = [x, y] such that ApprDecision(x, 1/10) 
returns “no” while ApprDecision(|/, 1/10) returns “yes”. Easy to verify that a = < 6* < = h. We 

then start with a pair ki = a/(l -|- e:), and = 5/(1 — e:), and perform a standard binary search while 
always maintaining that ApprDecision(/cj, e/4) returns “no” and ApprDecision(fc/j, e/4) returns “yes” until 
kh — ki < {b — a)e/3. It is easy to verify that the invariant holds when we start, and the number of 
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iterations is at most 0(log(l/£)). Furthermore, because of the invariant that we maintain, we have 
(1 — e/A)ki < < (1 + e/A)kh and kh — h < {h — a)e/2 < ^*e/2. It then follows that when e < 1, 

<5* < (1 + e/A)kh < (1 + e/^m + 5*e/2) ^ <5* < (1 + e/A)ki/{l - (1 + e/^)e/2) < (1 + e)h. 

This implies that ki is an ^-approximation of S*. Hence we can use a fuzzy decision procedure to 
approximate S*. 

B Left-inequality of Lemma 3.3 

Let C* be the complete order-preserving correspondence that produce We now modify it 

into a correspondence C between vr and a as follows: First we add all matches to C if 

{piAj) ^ C*. Next, we take each pair of consecutive matches. There are three cases as illustrated in 
Figure 4. The hrst two are symmetric, and we simply add matches {pi^{i),qk) (resp. {pkAi„(j))) into C 




(3) 



■=[> 



Figure 4: Small empty circles mark vertices of T and a. Three cases for two consecutive pairs from C* 
between n and a. For case (3), we hrst add all correspondences between pi and all vertices between q\ 
and q 2 along a, we then add all correspondences between q^ and vertices between pi and p 2 - 


for k G Ia{j + 1)) (resp. k G (/ 7 r(*), + !)))• For the third case, we add all matches of the form 

{Pu(i),qk) for k G and of the form for k G (4 (i),/,,(* +1)) and u = + 

It is easy to verify that the resulting matching M is both complete and order-preserving. Furthermore, 
by triangle inequality, each match (pj, qj) added for the hrst two cases satishes d{pi, qj) < (5d(^, ?) + P] 
while an edge (p^, qu) added in last case satishes 

d(pfc, qu) < d(pfc,Pr4i)) + d(p/^(i), + d(g7^(p, g„) < 5 d(^, 5) + 2p. 

This proves the left-hand inequality in Lemma 3.3. 

C Approximating ^D(7r, a) for backbone curves 

Let ApprDecision(7r, cT, 5, e) denote the £-fuzzy decision procedure for 5D(7r,cT) for two backbone curves 
vr and a. In order to hnd an ^-approximation of 6D{n,a), we can simply use Theorem 3.1. However, 
for this particular case, we can have a much simpler binary search procedure within similar time/space 
complexity that avoids the construction of WSPD. 


14 











In particular, if (5D(vr, a) < f3, for some constant /3, say /3 = 1, then we know that there exist a pair of 
vertices p* G vr and q* E a such that d{p*,q*) = 5D(7r, cr) < f3. We collect the set T of all pairs between 
TT and a with distance smaller than (d. By similar packing argument as in Section 3.4 (in computing 
white cells), |T| = 0{n + m) and we can compute T in the same time/space. We then simply perform 
a binary search among T to locate (p*, q*) and compute (5 d(-P, Q) exactly in 0((n + m) log(nm)) time. 
We call this procedure ExactFSmall(P, Q, £,/?). 

Algorithm ApprFBackbone(P, Q, e) 

begin 

Set So = Sn = 1, yes = 0, no = 0. 
while ( yes == 0 or no == 0 ) do 

if 5 < 1 then 

return ExactFSmall(P, Q, £, 2) 
end if 

if ApprDecision(5„, e:/3) == ‘yes’) then 
set yes = 1, (5o = 6n, A = So/ (1 + e/3) 
else 

set no = 1,= (1 + e/3)So 

end if 
end while 

return So 

end 


Figure 5: Algorithm ApprFBackbone(P, Q, e) computes an e-approximation of ^d(P, Q) for two backbone 
curves. Subroutine ExactFSmall(P, Q, e,/?) computes (5 d(P, Q) exactly if S^{P,Q)</S. 

For the case when ^D(7r,cT) > /3, we perform a different search procedure as described in Figure 5. 
Easy to verify that while loop can be called at most 0(logi_,_£(?7, -|- m)) = 0((log(?7, -|- m))/e) time, 
as obviously (5D(7r, a) < C 2 (n + m) for backbone curves. To see that the output of the algorithm is 
indeed an e-approximation of (5D(7r,cr), observe that when the algorithm terminates, the sequence of 
answers from ApprDecision() is either a sequence of “yes” followed by one “no”, or a sequence of “no” 
followed by one “yes”. Let assume that we have the hrst case (the second is symmetric). Suppose 
the output of the algorithm is 5, then we have that (1 — e/?>)S < S* = SY){7i,a). In the previous 
iteration of the while loop, Sn = ^(1 + s/3), as the answer then was “yes”. Hence we have that 
(5* < (1 -F e/3)Sn = (1 P e/3)‘^S P (1 + e)S if £ < 1. This implies that S ^-approximates S*. Hence 
Theorem 3.6 follows. 


D Decision problem with switching cells § 

The only step unexplained is how to compute V[i] by merging C[i\ and V[i — 1] in 0(|I/[i — 1] | -|- |G[i] |) 
time. This can be achieved by a bottom-up scanning for V[i — 1] and C[i] simultaneously. More 
specihcally, given V[i — 1] and C[i], we can sort the endpoints of their intervals in 0{\V[i — 1]| -|- |C'[i]|) 
time using merge sort. We process them in order and maintain the partial V[i\ at any time. For sake 
of simplicity, we call an endpoint a L-point (resp. H-point) of V[i — 1] or C[i] if it is the low-endpoint 
(resp. higher endpoint) of some interval from V[i — 1] or C[i]. The pseudo-code is shown in Figure 6, 
where V = V[i — 1], C = C[i], and the output is X = I/[i]. It is easy to verify the correctness of the 
algorithm, and the running time is proportional to the sum of interval lists being merged. 
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Algorithm mergeCoiumn(v,G) 

begin 

Set potentialReachFlag = 0 and potentialStartFlag = 0 
Sort iL, the set of endpoints from V and C 
for ( i = 1; i < \H\-, i + +) do 
if ( H[i] is L-point of V ) then 
Set potentialReachFlag = 1 
if ( potentialStartFlag == 1 ) 
then Add H[i] as L-point for X 
else if ( H[i] is H-point of V ) then 
Set potentialReachFlag = 0 
else if ( H[i] is L-point of C ) then 
Set potentialStartFlag = 1 
if ( potentialReachFlag = 1 ) 
then Add H[i] as L-point for X 
else if ( H[i] is H-point of C ) then 
Set potentialStartFlag = 0 
Add H[i] as H-point for X 
end for 

end 

Figure 6: Algorithm to compute V[i] (i.e, X) from V[i — 1] (i.e, V) and C[i] (i.e, C). 
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